Computer simulations of language change notes

This website collects my personal notes on Computer simulations of language change. These notes are provided to bring full transparency to my research process. Of course, since they are only notes, they do not reflect my final thoughts on a topic, and should not be interpreted as such. To read finished papers, please consult my website. Do not use these notes as a basis for your own scientific research. Start from high-quality, peer-reviewed scientific literature instead.

Language, Usage and Cognition

A usage-based perspective on language

Domain-general processes

As mentioned above, a consequence of viewing language as a complex adaptive system and linguistic structure as emergent (Lindblom et al. 1984, Hopper 1987) is that it focuses our attention not so much on linguistic structure itself, as on the processes that create it (Verhagen 2002). By searching for domain-general processes, we not only narrow the search for processes specific to language, but we also situate language within the larger context of human behaviour. (Bybee 2010, 7)

cognitive processes in this book

categorisation
chunking
rich memory storage
analogy
cross-modal association

(p. 7)

1. categorisation

the most pervasive of these processes
interacts with the others
“the similarity or identity matching that occurs when words and phrases and their component parts are recognised and matched to stored representations”
“resulting categories are the foundation of the linguistic system”

Categorization is domain-general in the sense that perceptual categories of various sorts are created from experience independently of language.

2. chunking

“the process by which sequences of units that are used together cohere to form more complex units”
“chunking is basic to the formation of sequential units expressed as constructions, constituents and formulaic expressions”
“repeated sequences of words (or morphemes) are packaged together in cognition so that the sequence can be accessed as a single unit”

It is the interaction of chunking with categorization that gives conventional sequences varying degrees of analysability and compositionality.

3. rich memory

“the memory storage of the details of experience with language, including phonetic detail for words and phrases, contexts of use, meanings and inferences associated with utterances”
“memory for linguistic forms is represented in exemplars, which are built up from tokens of language experience that are deemed to be identical”

(p. 8)

4. analogy

“the process by which novel utterances are created based on previously experienced utterances”

Usage-based grammar

construction (p. 9)

a direct form-meaning pairing
has a sequential structure
may include positions that are fixed as well as positions that are open
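For simulation purposes, a construction with fixed and open positions could be stored as a small data structure. The sketch below is only an illustration: the class names `Slot` and `Construction`, the example pattern and its meaning gloss are my own assumptions, not something Bybee proposes.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Slot:
    """One position in a construction (hypothetical representation)."""
    fixed: Optional[str] = None                       # literal material, or None if open
    category: Optional[Callable[[str], bool]] = None  # membership test for an open slot

    def accepts(self, token: str) -> bool:
        if self.fixed is not None:
            return token == self.fixed
        # An open slot without a category accepts anything.
        return self.category is None or self.category(token)


@dataclass(frozen=True)
class Construction:
    """A form-meaning pairing with a sequential structure."""
    meaning: str
    slots: Sequence[Slot]

    def matches(self, tokens: Sequence[str]) -> bool:
        return len(tokens) == len(self.slots) and all(
            slot.accepts(token) for slot, token in zip(self.slots, tokens)
        )


# Illustrative instance: one fixed position followed by two open ones.
# The set of fillers and the gloss are assumptions made for this example.
state_words = {"crazy", "nuts", "mad"}
drive_cx = Construction(
    meaning="cause X to enter state Y",
    slots=[Slot(fixed="drive"), Slot(), Slot(category=state_words.__contains__)],
)
```

With this instance, `drive_cx.matches(["drive", "me", "crazy"])` holds, while a sequence with a different first word or an unlisted final word is rejected.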

syntax and semantics (p. 9)

interwoven → not two separate things!

The levels of abstraction found in a usage-based grammar are built up through categorization of similar instances of use into more abstract representations (Langacker 1987, 2000) (p. 9)

evolution of constructions (p. 9)

the evidence that specific instances of constructions impact representation includes the fact that these instances can change gradually into new, independent constructions, through repetition (Chapters 2, 6 and 8)
in an exemplar model, all variants are represented in memory as exemplar clusters
- such clusters can change gradually, representing the changes that language undergoes as it is used

Change is postulated to occur as language is used rather than in the acquisition process (Chapters 6, 7 and 8). (p. 9)

Sources of evidence

In usage-based theory, where grammar is directly based on linguistic experience, there are no types of data that are excluded from consideration because they are considered to represent performance rather than competence. Evidence from child language, psycholinguistic experiments, speakers’ intuitions, distribution in corpora and language change are all considered viable sources of evidence about cognitive representations, provided we understand the different factors operating in each of the settings that give rise to the data. (p. 10)

… it should come as no surprise that much of the argumentation is based on examples that demonstrate tendencies in language change (p. 10)

Rich memory for language: exemplar representation

Introduction

Contrast with the parsimonious storage of generative theory and its structuralist precursors

The structuralist tradition

structural / generative traditions (p. 15)

firmly committed to the idea that redundancies and variation are extracted from the signal → discarded!

Langacker 1987 argues that a necessary prerequisite to forming a generalization is the accumulation in memory of a set of examples upon which to base the generalization. Once the category is formed or the generalization is made, the speaker does not necessarily have to throw away the examples upon which the generalization is based. (p. 15)

Exemplar models in phonology

The reducing effect of frequency

A robust finding that has emerged recently in quantitative studies of phonetic reduction is that high-frequency words undergo more change or change at a faster rate than low-frequency words. High-frequency words have a greater proportion of consonant deletion in the case of American t/d-deletion (Gregory et al. 1999, Bybee 2000b) as well as in Spanish intervocalic [ð]-deletion (Bybee 2001a). Unstressed vowels are more reduced in high-frequency words, as shown in Fidelholtz 1975 for English and Van Bergem 1995 for Dutch, and are more likely to delete (Hooper 1976). (p. 20)

Words that are used more often are exposed to the bias more often and thus undergo change at a faster rate. The leniting bias is a result of practice: as sequences of units are repeated, the articulatory gestures used tend to reduce and overlap. (p. 20)

Exemplar models provide a natural way to model this frequency effect (an early proposal is found in Moonwomon 1992). If the phonetic change takes place in minute increments each time a word is used and if the effect of usage is cycled back into the stored representation of the word, then words that are used more will accumulate more change than words that are used less. Such a proposal depends upon words having a memory representation that is a phonetic range, that is, a cluster of exemplars (Bybee 2000b, 2001, Pierrehumbert 2001), rather than an abstract phonemic representation. (p. 20)
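The mechanism in this passage can be sketched as a tiny simulation. This is a minimal sketch under my own assumptions, not Bybee's or Pierrehumbert's actual model: articulation is reduced to a single number (1.0 = unreduced), memory is a fixed-size cloud where the oldest exemplar is forgotten, and the parameter names and values (`bias`, `noise`, `memory`) are hypothetical.

```python
import random


def simulate_reduction(token_weights, steps=5000, bias=0.001, noise=0.0005,
                       memory=50, seed=0):
    """Incremental phonetic reduction in exemplar clouds.

    token_weights maps each word to a relative token frequency (made-up values).
    Every word starts with `memory` exemplars at 1.0 (fully unreduced).
    Returns the mean exemplar value per word after `steps` uses in total.
    """
    rng = random.Random(seed)
    clouds = {word: [1.0] * memory for word in token_weights}
    words = list(token_weights)
    weights = [token_weights[w] for w in words]

    for _ in range(steps):
        # Frequent words are used, and so exposed to the bias, more often.
        word = rng.choices(words, weights=weights)[0]
        cloud = clouds[word]
        target = rng.choice(cloud)                      # production starts from a stored exemplar
        token = max(0.0, target - bias + rng.gauss(0.0, noise))  # minute leniting step
        cloud.pop(0)                                    # oldest exemplar is forgotten
        cloud.append(token)                             # usage is cycled back into memory

    return {word: sum(cloud) / len(cloud) for word, cloud in clouds.items()}


means = simulate_reduction({"high": 9, "low": 1})
```

Because each produced token is stored and later serves as a production target, the small bias accumulates, and the word that is used more often ends up with the more reduced cloud.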

Morphology

The conserving effect of token frequency

Exemplar models allow a natural expression of several effects of high token frequency: because exemplars are strengthened as each new token of use is mapped onto them, high-frequency exemplars will be stronger than low-frequency ones, and high-frequency clusters – words, phrases, constructions – will be stronger than lower frequency ones. (p. 24)

↓ effects (p. 24)

stronger exemplars are easier to access, thus accounting for the well-known phenomenon by which high-frequency words are easier to access in lexical decision tasks
high-frequency, morphologically complex words show increased morphological stability

↳ morphological stability

frequent forms resist regularizing or other morphological change
⇒ irregular inflectional forms tend to be of high frequency

Assuming that regularization occurs when an irregular form is not accessed and instead the regular process is used, it is less likely that high-frequency inflected forms would be subject to regularization. (p. 25)
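A rough way to simulate this reasoning is shown below. The access rule is an assumption of mine, not something given in the book: the irregular form is retrieved with a probability that grows with its stored strength, a failed retrieval yields the regular form, and whichever form is produced is strengthened. The constant `regular_pull` is a hypothetical stand-in for the standing availability of the regular pattern.

```python
import random


def simulate_paradigm(irregular_strength, uses=500, regular_pull=5.0, seed=0):
    """Return the share of stored strength that the irregular form holds at the end.

    irregular_strength: initial exemplar strength, standing in for token frequency.
    """
    rng = random.Random(seed)
    irregular = float(irregular_strength)
    regular = 0.0
    for _ in range(uses):
        p_access = irregular / (irregular + regular + regular_pull)
        if rng.random() < p_access:
            irregular += 1.0   # token mapped onto the irregular exemplar strengthens it
        else:
            regular += 1.0     # irregular not accessed: the regular process is used instead
    return irregular / (irregular + regular)


high_frequency = simulate_paradigm(irregular_strength=200)
low_frequency = simulate_paradigm(irregular_strength=2)
```

Under these assumptions the form that starts out strong keeps being accessed and stays irregular, while the weak form is regularized much more often.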

The more frequent of the members of a paradigm tends to serve as the basis of new analogical formations; thus the singular of nouns is the basis for the formation of a new plural (cow, cows) rather than the plural serving as the basis for a new singular (kine [the old plural of cow] does not yield a new singular *ky). Similarly, the present form serves as the basis for a regularized past and not vice versa. (p. 25)

Syntax

Evidence for an exemplar representation for constructions

Second, consider the way new constructions arise. New constructions are specific exemplars of more general existing constructions that take on new pragmatic implications, meanings, or forms due to their use in particular contexts. (p. 28)

Conclusion

(p. 31-32)

rich memory representation

evidence is found across all levels of grammar

↓

phonetics

phonetic details are part of a language user’s knowledge of his or her language

morphology

frequency (= exemplar strength) is important to morphological structure and change
specific instances of constructions have representations that can be accessed for analogical extensions/the creation of new constructions

speaker’s experience

exemplar models allow the direct representation of both variation and gradience

↓

phonetic variation

represented directly
allows a means of implementation of gradual sound change

morphologically complex words

can vary in frequency or strength of representation
each can have its own degrees of compositionality and analysability

syntax

differences in frequency of specific exemplars of constructions can lead to the loss of compositionality and analysability
⇒ eventual, gradual creation of new construction

Other implications of exemplar representation for constructions are discussed in subsequent chapters.

Chunking and degrees of autonomy

Introduction

Chunking

chunk (p. 34)

“a unit of memory organisation”
“formed by bringing together a set of already formed chunks in memory and combining them together into a larger unit” (Newell 1990)
chunking = “the ability to build up such structures recursively, thus leading to a hierarchical organization of memory”
TODO also refer to the psychology literature! → Lien!

↳ repetition (p. 34)

the principal experience that triggers chunking
if two or more smaller chunks occur together with some degree of frequency, a larger chunk containing the smaller ones is formed

Note that repetition is necessary, but extremely high frequency in experience is not. Chunking has been shown to be subject to the Power Law of Practice (Anderson 1982), which stipulates that performance improves with practice but the amount of improvement decreases as a function of increasing practice or frequency. Thus once chunking occurs after several repetitions, further benefits or effects of repetition accrue much more slowly. (p. 34)
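The diminishing returns described here can be made concrete with the usual power-function form T(n) = a · n^(−b). The sketch below assumes that form; the parameter values (`initial_time`, `rate`) are placeholders and are not taken from Anderson 1982 or from Bybee.

```python
def practice_time(n, initial_time=1.0, rate=0.5):
    """Processing time after n repetitions, assuming T(n) = a * n ** (-b)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return initial_time * n ** (-rate)


def improvement(n, initial_time=1.0, rate=0.5):
    """Time saved by going from the n-th to the (n+1)-th repetition."""
    return practice_time(n, initial_time, rate) - practice_time(n + 1, initial_time, rate)


# The gain from one more repetition shrinks as practice accumulates.
gains = [improvement(n) for n in (1, 10, 100)]
```

Each value in `gains` is smaller than the one before it, which matches the idea that once a chunk has formed, further repetition adds benefits only slowly.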

In general experience as well as in language, it is usually the case that the larger the chunk, the less often it will occur. (p. 35)

While language users constantly acquire more and larger chunks of language, it is not the case that in general the language acquisition process proceeds by moving from the lowest level chunks to the highest. Even if children start with single words, words themselves are composed of smaller chunks (either morphemes or phonetic sequences), which only later may be analysed by the young language user. In addition, however, children can acquire larger multiword chunks without knowing their internal composition (Peters 1983) (p. 35)

All sorts of conventionalized multiword expressions, from prefabricated expressions to idioms to constructions, can be considered chunks for the purposes of processing and analysis. (p. 35)

status of a chunk in memory (p. 36)

falls along a continuum!

↓

no chunk

words that have never been experienced together

↕

weak chunk

words that have been experienced together only once and fairly recently
internal parts are stronger than the whole

↕

frequent chunk

e.g. lend a hand or pick and choose
easily accessible as wholes, while still maintaining connections to their parts

On the high-frequency end of the continuum, chunks such as grammaticalizing phrases or discourse markers do lose their internal structure and the identifiability of their constituent parts; see section 3.4.2 for discussion. (p. 36)

↳ source of constructions

chunking + categorisation
we know this from the fact that they involve sequences of units + that they have at least one schematic category

The reducing effect of frequency

Reduction of words in context

phonetic reduction (p. 37)

occurs earlier and to a greater extent in high-frequency words than in low-frequency ones

In addition, we must note that words that are used more often in a context favourable to reduction will also undergo more reduction. (p. 37)

Causes of reduction

factors (p. 38)

word frequency
frequency in context
predictability from surrounding words
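For a simulation, these three factors could be estimated from a tokenised corpus roughly as follows. This is a sketch with simplifying assumptions of my own: "frequency in context" is approximated by co-occurrence with one given preceding word, and predictability by the conditional probability of the word given that preceding word. The function name and the toy corpus are hypothetical.

```python
from collections import Counter


def reduction_predictors(tokens, word, previous):
    """Return (word frequency, frequency after `previous`, P(word | previous))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    word_frequency = unigrams[word]
    frequency_in_context = bigrams[(previous, word)]
    # Count how often `previous` is followed by anything at all.
    context_total = sum(count for (first, _), count in bigrams.items() if first == previous)
    predictability = frequency_in_context / context_total if context_total else 0.0
    return word_frequency, frequency_in_context, predictability


toy_corpus = ["i", "dont", "know", "i", "dont", "care", "i", "know"]
predictors = reduction_predictors(toy_corpus, word="dont", previous="i")
```

In the toy corpus, "dont" occurs twice, both times after "i", and "i" is followed by "dont" in two of its three occurrences.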

Differential reduction within high-frequency chunks

differential reduction within chunks (p. 43)

according to how frequently the subparts occur together
fusion of going to into gonna [gənə] is due to the fact that this sequence is invariant in the grammaticizing phrase be going to

Autonomy: the structure and meaning of chunks

(p. 45)

compositionality

a semantic measure
the degree of predictability of the meaning of the whole from the meaning of the component parts (Langacker 1987)
gradient

analysability

‘recognition of the contribution that each component makes to the composite conceptualization’
gradient

Derived words can be compositional or not: compare hopeful, careful and watchful, which have fairly predictable meanings based on the meanings of the noun base and suffix, to awful and wonderful, which are less compositional since awful indicates a negative evaluation not present in the noun awe and wonderful indicates a positive evaluation not necessarily present in wonder.

(p. 45)

As we noted in Chapter 2, an idiom such as pull strings is not fully compositional in that it has a metaphorical meaning, but it is analysable, in the sense that an English speaker recognizes the component words, as well as their meanings and relations to one another and perhaps activates all this in the interpretation of the idiom. Similarly, compounds such as air conditioning or pipe cleaner are analysable in that we recognize the component words; however, as is well known, the interpretation of compounds is highly context-dependent and thus they are not usually fully compositional (Downing 1977).

(p. 45)

Frequency effects and morphosyntactic change

Changes in morphosyntactic analysability and semantic compositionality

Hay demonstrates through several experiments that the derived words that are more frequent than their bases are less compositional or less semantically transparent than complex words that are less frequent than their bases.

(p. 46)

Thus entice is more frequent than enticement; eternal is more frequent than eternally; top is more frequent than topless. However, there are also cases where the reverse is true: diagonally is more frequent than diagonal; abasement is more frequent than abase and frequently is more frequent than frequent. (p. 46)

Hay shows that simple token frequency does not correlate with the results of either experiment, as the claim in Bybee 1985 would predict. In Bybee 1985 I proposed that loss of analysability and semantic transparency were the result of the token frequency of the derived word. Hay has improved on this claim by showing the relevant factor to be relative frequency, at least at the frequency levels she studied. My suspicion is that at extremely high token frequencies, loss of analysability and transparency will occur independently of relative frequency. (p. 46, emphasis mine)
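The contrast between token frequency and relative frequency is easy to state in code. The sketch below assumes relative frequency is simply the ratio of derived-word tokens to base tokens; the function names are mine, and all counts are placeholders rather than corpus figures.

```python
def relative_frequency(derived_count, base_count):
    """Ratio of the derived word's token frequency to that of its base."""
    if base_count <= 0:
        raise ValueError("base_count must be positive")
    return derived_count / base_count


def predicted_less_transparent(derived_count, base_count):
    """Predict reduced transparency when the derived word outnumbers its base."""
    return relative_frequency(derived_count, base_count) > 1.0


# Placeholder counts: two derived words with the same token frequency
# but different base frequencies receive different predictions.
same_tokens_rare_base = predicted_less_transparent(derived_count=120, base_count=40)
same_tokens_common_base = predicted_less_transparent(derived_count=120, base_count=900)
```

The point of the comparison is that a prediction based on raw token frequency alone would treat the two cases identically, whereas the relative measure separates them.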

Processing morphologically complex words (p. 47)

full transparency

based directly on its component morphemes
especially if the complex word is unfamiliar

single unit + transparency

still activates the morphemes that make it up

fully opaque

no activation of the component morphemes at all
would be the case if analysability has been lost for that word

Increasing autonomy

analysability loss (p. 48)

particularly likely in cases of extreme frequency increases
complex units may become autonomous from their sources, losing both internal structure and transparent meaning
but, not binary → a gradient

contributions to autonomy

↓

repeated direct access to complex sequences
phonetic reduction
pragmatic associations arising in contexts of use

Grammaticalization

grammaticalization (p. 50-51)

in cases of extreme frequency increases
compositionality and analysability are completely lost
a specific instance of a construction…
1. takes on new uses, gains in frequency
2. undergoes phonetic and semantic change
3. thereby begins to lose its compositionality and analysability

Autonomy and exemplar cum network model

Meaning and inference

Conclusion

Categorisation and the distribution of constructions in corpora

Introduction

Why constructions?

constructions (p. 76)

direct pairings of form with meaning (+ pragmatics!)
often have schematic positions that range over a number of lexical items
often contain explicit lexical material
- e.g. way, what, be doing

While everyone who works on constructions agrees that they cover everything from monomorphemic words, to complex words, to idioms, all the way up to very general configurations such as ‘the passive construction’ (because they are all form–meaning pairings), the term is usually applied to a morphosyntactically complex structure that is partially schematic. (p. 76)

a. Mr. Bantam corkscrewed his way through the crowd (Israel 1996).
b. What’s that box doing up there?
a. subject1 verb (manner of motion) poss pro1 way adverbial
b. What BE subject doing Y?

(p. 76-77)

we can note that it is not just the idiomatic portions of language that show a strong interaction of specific lexical items with grammatical structures. Even what must be regarded as fairly general syntactic structures, such as clausal complements, depend heavily upon the specific verb of the main clause. Thus think takes an ordinary finite clause (I think it’s going to snow) while want takes an infinitive clause (I want it to snow) and see takes a gerundial complement (I saw him walking along). The argument for constructions is that the interaction of syntax and lexicon is much wider and deeper than the association of certain verbs with certain complements. (p. 77)

construction-based grammar? (p. 77)

generative approaches are ill-equipped to specify the important, but often subtle differences in the distribution of constructions both language-internally and in a comparative perspective (Newmeyer 2005)

constructions and valence (p. 78)

a given verb may appear in a number of different constructions
verb itself cannot be relied upon to determine what arguments it might take → depends on the context
⇒ constructions

Why constructions? (p. 78)

Constructions are particularly appropriate units for formulating a domain-general account of the nature of grammar
- the formation, acquisition, and use of constructions is closely related to the domain-general process of chunking, by which bits of experience that are repeatedly associated are repackaged into a single unit
The development of the schematic portions of constructions is based on item-specific, similarity-based categorization
- = another domain-general cognitive ability
Constructions are particularly appropriate for exemplar models
- surface based and can emerge from the categorization of experienced utterances
- store both specific instances of constructions and allow for the abstraction of a more generalized representation

Categorization: exemplar categories

most important property of constructions (p. 78-79)

they describe the relation between specific lexical items and specific grammatical structures

↳ lexical items (p. 79)

contribute to the meaning of the construction
help determine its function and distribution in discourse
lexical items in a slot: constitute a category based primarily on semantic features

prototype effects (p. 79)

derive from graded category membership
some exemplars are central members of the category while others are more marginal
the mechanisms of exemplar categorization give rise naturally to prototype effects (Medin and Schaffer 1978)
two categorisation dimensions: similarity and frequency

For one thing, the fact that exemplars contain full detail of the percept (whether it be a bird or an utterance) allows for categorization by a number of features, not just those that are contrastive. For instance, a more prototypical bird is small – the size of a sparrow or a robin – while large birds are less prototypical, even though size is not a distinguishing feature of birds. (p. 79)

Given that constructions are conventional linguistic objects and not natural objects that inherently share characteristics, it seems that frequency of occurrence might significantly influence categorization in language. Considering also that using language is a matter of accessing stored representations, those that are stronger (the more frequent ones) are accessed more easily and can thus more easily be used as the basis of categorization of novel items. Because of this factor, a high-frequency exemplar classified as a member of a category is likely to be interpreted as a central member of the category, or at least its greater accessibility means that categorization can take place with reference to it. (p. 79-80)

Bybee says that she will give evidence. I guess this will be like lexical recognition tasks

Incoming exemplars are placed in semantic space closer to or farther from strong exemplars depending upon their degree of similarity. Categorization is probabilistic along the two dimensions. On some occasions categorization can be driven by similarity to a member of lesser frequency if there is greater similarity to this less frequent member (Frisch et al. 2001). However, the probabilistic interaction of frequency and similarity will result in a category whose central member is the most frequent member. (p. 80)
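One way to turn this into a simulation step is to let each stored exemplar attract a new item with a weight that combines its frequency and its similarity to that item, and then sample from those weights. The multiplicative combination, the function names and the similarity scores below are all assumptions made for illustration; the text only says that the two dimensions interact probabilistically.

```python
import random


def attraction_weights(new_item, exemplars, similarity):
    """Weight of each stored exemplar: token frequency times similarity to the new item."""
    return {name: freq * similarity(new_item, name) for name, freq in exemplars.items()}


def categorize(new_item, exemplars, similarity, rng):
    """Sample the exemplar that serves as the reference point for the new item."""
    weights = attraction_weights(new_item, exemplars, similarity)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names])[0]


# Toy data: frequencies and similarity scores are invented for this example.
stored = {"frequent": 25, "rare": 2}
scores = {
    ("item_a", "frequent"): 0.9, ("item_a", "rare"): 0.3,
    ("item_b", "frequent"): 0.05, ("item_b", "rare"): 0.9,
}


def toy_similarity(new_item, exemplar):
    return scores[(new_item, exemplar)]


choice = categorize("item_a", stored, toy_similarity, random.Random(0))
```

With these numbers, `item_a` is pulled mostly towards the frequent exemplar, while `item_b` is pulled more towards the rare one because its much greater similarity outweighs the frequency difference.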

Dimensions upon which constructions vary: degree of fixedness ⟷ schematicity in the slots of constructions

schematicity (p. 80)

how variable is the design of a construction?
is it fully fixed? are there variable slots?

slot-fillers (p. 80)

can be either very restricted or very broad
usually roughly synonymous, or related

highly schematic categories (p. 81)

i.e. grammatical categories at the level of NOUN or VERB
some schematic constructions refer to these highly schematic generalized categories

Prefabs as the centres of categories

restrictive nature of slots (p. 81)

semantic
distribution is clustered around a highly frequent exemplar
- this exemplar could be considered a prefab

Example of this distribution of slot Y for *drive X Y*
slot-filler	frequency
crazy	25
nuts	7
mad	4
up the wall	2
out of my mind	1
over the edge	1
Salieri-mad	1

(p. 81)

↳ the more frequent member serves as the central member of the category, and new expressions tend to be formed by analogy with the more frequent member
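The counts in the table above can be used directly to pick out the central member and its share of the slot. This is a small sketch; the function name `central_member` is my own.

```python
from collections import Counter

# Fillers of slot Y in "drive X Y", copied from the table above.
slot_fillers = Counter({
    "crazy": 25,
    "nuts": 7,
    "mad": 4,
    "up the wall": 2,
    "out of my mind": 1,
    "over the edge": 1,
    "Salieri-mad": 1,
})


def central_member(counts):
    """Return the most frequent filler and its share of all tokens in the slot."""
    filler, count = counts.most_common(1)[0]
    return filler, count / sum(counts.values())


filler, share = central_member(slot_fillers)
```

Here crazy accounts for 25 of the 41 tokens, roughly 61% of the slot, which is the kind of skewed distribution that makes it a plausible base for analogy.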

Why would crazy be the adjective that leads the march in this case? It is the most frequent adjective in this semantic domain. It is less serious than mad in its ‘insane’ meaning, because for American speakers crazy does not necessarily indicate a clinical condition and so it is more appropriate to the hyperbolic use.

(p. 82-83)

Prototype categories: ‘become’ verbs in Spanish

Failures of necessary and sufficient conditions

abstract analyses (p. 84)

strict necessary and sufficient conditions
an item either belongs or does not belong to the category

↕

exemplar categorisation (p. 84)

members of categories can be graded with respect to their centrality / marginality
local comparisons of incoming items with established items, taking into account both similarity on various dimensions and frequency of occurrence
⇒ items form close, local relationships wherever possible

More local categorization

quedarse-survey (p. 85)

seems more likely that rather than accessing a highly abstract feature, speakers rely on more local comparisons
higher frequency means greater accessibility, the more frequent adjectives tend to serve as the basis for such analogy more often

Similarity in categorization

Four categories

central: inmóvil

1. synonyms / near-synonyms

e.g. parado ‘stopped, standing’

2. metaphors that result in similar meanings

e.g. de piedra ‘of stone’

3. hyperbolic expressions

e.g. paralizado ‘paralyzed’ (with the meaning ‘motionless’)

4. shared features

items that share a feature, but add other features
e.g. atrapado ‘trapped’ (indicates motionless, but attributed to some restraining entity)

5. socially-informed inferential associations

bueno ‘good’, rico ‘rich’, famoso ‘famous’, and fuerte ‘strong’

Multiple clusters in constructional categories

clusters (p. 87)

there can be multiple clusters in the same slot!
not specifically one cluster per slot

The role of the most frequent member of the category

diachronic development (p. 90)

seems to emanate outwardly from the central member of a category
conventionalized expressions can develop through a few repetitions and set up a conventionalized way to talk about a situation
then variations on this theme → create a category.

Family resemblance structure

Categories which are more schematic

So far we have examined highly focused categories that are organized around a central member and show high degrees of similarity among the members. These would be less schematic categories, due to their narrow range. But other relationships among the items that occur in a position in a construction are possible as well. Exemplar learning allows categories of various sorts. Some categories are much more schematic and do not have a central high-frequency member. Others do have a high-frequency member but do not show expansion on the basis of that member. (p. 91)

Productivity

productivity (p. 94)

the likelihood that a construction will apply to a new item
a property of the category or categories formed by the open positions in a construction

degree of productivity for a slot (p. 94)

different for each lexical slot
within the same construction, some slots are very open, while others are quite closed

Centrality of membership is not autonomy

Increasing autonomy, which creates a new construction, has been discussed in Bybee 2003b and 2006a. Of relevance for the current discussion is the fact that when a particular instance of a construction – that is, a construction with a particular lexical item – becomes highly frequent, it is processed as a unit. As we saw in Chapter 3, the more often a sequence is processed directly as a unit, the less likely it is to activate other units or the construction to which it belongs and the more likely it is to lose its analysability. At the same time, use in particular contexts contributes to shifts in meaning, which decrease compositionality and make the former exemplar of a construction move away from its source. For example, the be going to construction arose from a purpose clause construction in which any verb could occupy the position go now occupies. Because of the semantic generality of go, it happened to be the most frequent movement verb in the purpose construction. Because of its use in context, one could infer a sense of intention to do something from it, and this became part of its meaning. As a result of its frequent access as a unit and the semantic change due to inferences in context, subject be going to verb has become a new construction independent of the purpose construction from which it arose. (p. 96)

Problems with collostructional analysis

What is collostructional analysis?

collostructional analysis (p. 97)

a method for analysing the distribution of lexemes in constructions for the purpose of addressing the meaning of constructions
(Stefanowitsch and Gries 2003)

how? (p. 97)

computational methods are used to determine which lexemes are most ‘attracted’ to constructions/which are most ‘repelled’

The researchers developing this method feel that it is important to take into account the overall token frequency of a lexeme in determining how expected it is in a construction, as well as the lexeme’s frequency in the construction. Thus a lexeme with an overall high token count will be judged as less attracted to a construction than one with a low frequency, all other things being equal. In addition, the calculation takes into account the lexeme’s frequency in the construction relative to other lexemes that appear in the construction. The final and fourth factor is the frequency of all the constructions in the corpus.

(p. 97)

Bybee’s issues with collostructional analysis

In the calculation, high overall token frequency of a lexeme detracts from its Collostructional Strength. The stated reasoning is to control for general frequency effects: in order for a lexeme to have high Collostructional Strength it must occur in the construction more often than would be predicted by pure chance (Gries et al. 2005: 646). (p. 97)
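To see how overall frequency lowers the score, here is a sketch of one common formulation: the four frequencies form a 2×2 table and the strength is derived from a one-sided Fisher exact test. The −log10 transformation and the function names are assumptions of this sketch, and the numbers are invented.

```python
from math import comb, log10


def attraction_p_value(lex_in_cx, lex_total, cx_total, corpus_total):
    """P(at least `lex_in_cx` co-occurrences) if lexeme and construction were independent.

    lex_in_cx:    tokens of the lexeme inside the construction
    lex_total:    tokens of the lexeme in the whole corpus
    cx_total:     tokens of the construction
    corpus_total: tokens of all constructions in the corpus
    """
    upper = min(lex_total, cx_total)
    favourable = sum(
        comb(lex_total, k) * comb(corpus_total - lex_total, cx_total - k)
        for k in range(lex_in_cx, upper + 1)
    )
    return favourable / comb(corpus_total, cx_total)


def collostructional_strength(lex_in_cx, lex_total, cx_total, corpus_total):
    """Higher values mean stronger attraction (assumed -log10 scaling)."""
    return -log10(attraction_p_value(lex_in_cx, lex_total, cx_total, corpus_total))


# Two lexemes with the same frequency inside the construction (10 of 100 tokens)
# but very different overall frequencies in a 10,000-token corpus.
rare_lexeme = collostructional_strength(10, 50, 100, 10_000)
common_lexeme = collostructional_strength(10, 2_000, 100, 10_000)
```

The lexeme that is rare overall comes out as far more strongly attracted, even though both occur equally often in the construction, which is exactly the property the notes below take issue with.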

Every lexeme was chosen by a speaker in a particular context for a particular reason.
It is entirely possible that the factors that make a lexeme high frequency in a corpus are precisely the factors that make it a central and defining member of the category of lexemes that occurs in a slot in a construction (see sections 5.7 and 5.11).

(p. 97)

(about solo ‘alone’)

Its highly general meaning makes it frequent in the corpora and it is also this general meaning that makes it a central member of the category of adjectives occurring in this construction. So in this case, Collostructional Analysis may give the wrong results, because a high overall frequency will give the word solo a lower degree of attraction to the construction according to this formula. (p. 98, emphasis mine)

The corpus-based analysis of Bybee and Eddington takes the most frequent adjectives occurring with each of four ‘become’ verbs as the centres of categories, with semantically related adjectives surrounding these central adjectives depending on their semantic similarity, as discussed above. Thus our analysis uses both frequency and semantics. Proponents of Collostructional Analysis hope to arrive at a semantic analysis, but do not include any semantic factors in their method. Since no semantic considerations go into the analysis, it seems plausible that no semantic analysis can emerge from it. (p. 98)

Aaaaah, Firth is turning in his grave

(p. 100-101)

First, observe that the adjectives that occurred in the constructions with the highest frequency have the highest Collostructional Strength and also have high ratings for acceptability. For these cases, Collostructional Strength and mere frequency make the same predictions.

For the low-frequency adjectives, however, the experiment revealed, as Bybee and Eddington had predicted, a difference between low-frequency adjectives that were semantically similar to the high-frequency ones and those that were not. This turned out to be quite significant in the experiment with the low-frequency, semantically related adjectives garnering judgements almost as high as the high-frequency adjectives. In contrast, Collostructional Analysis treats all of the adjectives that occurred with low frequency in the construction the same, giving them very low scores. Of course, the Collostructional Analysis cannot make the distinction between semantically related and unrelated since it works only with numbers and not with meaning. Thus, for determining what lexemes are the best fit or the most central to a construction, a simple frequency analysis with semantic similarity produces the best results.

(p. 100)

A reasonable interpretation of the results of the Bybee and Eddington corpus study and experiment is that lexemes with relatively high frequency in a construction are central to defining the meaning of the construction (Goldberg 2006) and serve as a reference point for novel uses of the construction. If this interpretation is correct, then the frequency of the lexeme in other uses is not important.

Gries and colleagues argue for their statistical method but do not propose a cognitive mechanism that corresponds to their analysis. By what cognitive mechanism does a language user devalue a lexeme in a construction if it is of high frequency generally? This is the question Collostructional Analysis must address. (p. 100)

Greater degrees of abstraction

Where do constructions come from? Synchrony and diachrony in a usage-based theory

Diachrony as part of linguistic theory

Grammaticalisation

grammaticalisation (p. 106)

the process by which a lexical item or a sequence of items becomes a grammatical morpheme, changing its distribution and function in the process (Meillet 1912, Lehmann 1982, Heine and Reh 1984, Heine, Claudi and Hünnemeyer 1991, Hopper and Traugott 2003)
e.g. ENG going to ⇒ gonna

grammaticalisation of lexical items (p. 106)

takes place within particular constructions
creates new constructions

Thus going to does not grammaticalize in the construction exemplified by I’m going to the gym but only in the construction in which a verb follows to, as in I’m going to help you.

(p. 106)

How grammaticalization occurs

origin of grammaticalization (p. 107)

language use
involves the process by which a particular lexical instance of a construction (go in the purpose construction) becomes autonomous from the other instances of the construction
this process of course includes the loss of analysability and compositionality

effects (p. 107)

new chunks
phonetic changes triggered by increased frequency
semantic and pragmatic changes as a result of the contexts in which the emerging construction is used

1. chunking (p. 108)

a particular instance of a construction becomes a chunk as a result of repetition
sequences involved in the chunk undergo phonetic reduction
this reduction can be extreme, especially under high frequencies
- due to the automatisation of the articulatory gestures in these sequences

2. loss of analysability

as a result of chunking, the internal units of the grammaticalising expression become more opaque

3. loss of specific meaning / “bleaching”

components of meaning appear to be lost
gonna no longer indicates movement in space; will no longer indicates ‘wanting to’; can no longer means ‘know’ or ‘know how to’ in all instances; a/an is still singular, but does not explicitly specify ‘one’

Through grammaticalization we see how the grammar of a language can arise just as structure arises in a complex adaptive system. The mechanisms operating in real time as speakers and listeners use language, repeated over and over again in multiple speech events, lead to gradual change by which grammatical morphemes and their associated constructions emerge. The lexical material which consists of both form and meaning is molded into constructions which are conventionalized, repeated and undergo further change in both form and meaning. (p. 110)

increases in frequency trigger their operation, while at the same time the output of these processes (semantically more generalized meanings or a wider applicability due to inferences) leads to further frequency increases (Bybee 2009a). (p. 113)
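
The feedback loop in this passage (frequency triggers the processes, and their output raises frequency again) can be sketched as a toy simulation. This is purely my own illustration: the update rules, the parameter names (`freq`, `generality`, `rate`) and the starting values are all assumptions, and nothing here comes from Bybee.

```python
def simulate_feedback(steps=20, freq=0.05, generality=0.1, rate=0.5):
    """Toy model of the frequency/generalisation feedback loop.

    freq: share of relevant contexts in which the construction is used (0..1)
    generality: how bleached/general its meaning is (0..1)
    rate: assumed strength of both effects
    Returns the list of (freq, generality) pairs, one per time step.
    """
    history = [(freq, generality)]
    for _ in range(steps):
        # Assumption: more frequent use drives more generalisation,
        # with diminishing room as generality approaches 1.
        generality += rate * freq * (1 - generality)
        # Assumption: a more general meaning fits more contexts,
        # so frequency grows faster the more general the meaning is.
        freq += rate * generality * freq * (1 - freq)
        history.append((freq, generality))
    return history


history = simulate_feedback()
for step, (f, g) in enumerate(history):
    if step % 5 == 0:
        print(f"step {step:2d}: freq={f:.3f} generality={g:.3f}")
```

Under these assumed rules both values can only rise, each feeding the other, which is at least compatible with the unidirectionality discussed later in the chapter. A different choice of update rules could of course behave differently.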

The explanatory power of grammaticalization

Criticisms of grammaticalisation: unidirectionality and grammaticalisation theory

The problem here is of course that the assumption that language can only change during acquisition is incorrect. It is worth noting that this claim is frequently made by researchers whose empirical research does not actually address this question (Janda, Newmeyer). In the next section we address the issue of child-based language change directly. For now note that it is the generativist view of grammar as discrete and unchanging in the adult, that makes this assumption necessary and which thus denies the striking unidirectionality of grammaticalization change. In contrast, if usage is the basis of grammar and change in the grammar, then there is no a priori reason why change cannot occur over an adult’s lifetime. Given that the mechanisms that propel the changes encompassed by grammaticalization are operative in all generations, there is no reason to doubt that change can be unidirectional. (p. 114)

The source of change: language acquisition or language use?

Given that in structural and generative theories grammatical structures are discrete and independent of meaning and use, change must be regarded as an anomaly. The source of change cannot reside in usage or the grammar itself, and thus it has been proposed in these theories that change in the grammar can only come about during its transmission across generations. While many writers assume that the child language acquisition process changes language (Halle 1962, Kiparsky 1968, Lightfoot 1979 and many others both earlier and later; see Janda 2001 for more references), empirical evidence that this is actually the case is still lacking (Croft 2000). (p. 115)

However, Slobin notes that children start with the concrete notions and those most anchored in the present because these notions are cognitively the most simple, natural and accessible. Similarly, in diachrony, the most concrete notions often constitute the starting points for grammaticalization because the material the process works on comes from the basic lexicon – concrete nouns such as body parts and highly generalized verbs such as be, have and go. Thus the parallel here between ontogeny and phylogeny is the correspondence between two processes that may be only superficially similar. (p. 116)
